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ABSTRACT 

Radio transients are sporadic signals and their detection requires that the backends of 
radio telescopes be equipped with the appropriate hardware and software to undertake 
this. Observational programs to detect transients can be dedicated or they can piggy- 
back on observations made by other programs. It is the single-dish single-transient 
(non-periodical) mode which is considered in this paper. Because neither the width of 
a transient nor the time of its arrival is known, a sequential analysis in the form of a 
cumulative sum (cusum) algorithm is proposed here. Computer simulations and real 
observation data processing are included to demonstrate the performance of the cusum. 
The use of the Hough transform is here proposed for the purpose of non-coherent 
de-dispersion. It is possible that the detected transients could be radio frequency 
interferences (RFI) and a procedure is proposed here which can distinguish between 
celestial signals and man-made RFI. This procedure is based on an analysis of the 
statistical properties of the signals. 

Key words: miscellaneous - data analysis- statistical. 



1 INTRODUCTION 

Radio astronomy signals received by radio telescopes have 
noise-like waveforms which have a normal (Gaussian) prob- 
ability distribution function (pdf) A/"(0, cr s ), i. e., with zero 
mean and variance crj. Background radio emission and ra- 
dio receivers also produce normal noise A/"(0, cr sys ). Basically, 
radio astronomy observations consist of detecting and mea- 
suring (jg at the background of cr1 ys . This is valid for to- 
tal power radiometry (spatial distribution of signal noise 
power), spectrography (temporal coherence measurements) 
and interferometry (spatial coherence measurements). To- 
tal power radiometry is involved with the intensity variabil- 
ity of radio sources and, in particular, with sporadic phe- 
nomena - transient radio emissions. The time scale of radio 
transients can span nanoseconds to days. Traditional total 
power radiometers are not equipped with backends (both 
hardware and software) designed to detect transients. Many 
transients were found in a serendipitous way and the discov- 
eries of pulsars and RRAT are the most prominent examples 
of such discoveries. Systematic searching for non-periodical 
transients began only recently (in the last decennia) . Future 
radio telescopes (ATA, MWA, LOFAR, SKA) will be able to 
provide more oppo rtunities for the detecting of transients. 
(|Cordes et aLll2004l ). 

Several works dedicated to single radio transient detec- 
tion have been published. Interstellar scattering and scin- 
tillation, effective time resolution, de-dispersion methods, 
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matched filtering and thres holding have been considered in 
(|Cordes. fe McLaughlinll2003l ). 

A transient su rveys strategy and search processing have 
been studied ([Corded [2009). The importance of single- 
pulse detection vs. many-pulse detection for highly mod- 
ulated pulse trains is demon strated in this work and also in 
([McLaughlin feCor des 2003). The influence of the amplitude 
probability distribution of transients was studied in detail in 
this latter article. 

The search for transients consists of two main opera- 
tions: 

1. De-dispersion aimed at removing the effects of inter- 
stellar dispersion. There are two methods of de-dispersion: 

a) Coherent de-dispersion which is made before employing a 
total power detector, i.e., with "voltage" signals, shifted to 
the baseband frequency domain. As a rule, the signals are 
digitized and digitally processed. The processing performs 
phase rotation opposite to phase rotation undergone during 
propagation through interstellar media. Realization of the 
filter can be made in the frequency domain using FFT or 
in the time domain using finite-impulse response filter (FIR 
filter). 

b) Non-coherent (post-detection) de-dispersion operates af- 
ter total power detection. The total bandwidth of the re- 
ceived signal is divided on many narrow-band sub-bands as 
in a spectral analyzer and the voltage signals at the outputs 
of this partial filters are squared. These digitized "intensity" 
signals are time-shifted (with respect to each other) to com- 
pensate for the frequency-dependent delay produced by the 
interstellar media. 
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Both de- dispersion methods require some a priori knowledge 
of the dispersion measure (DM). This information about DM 
is usually not available in the search for radio transients. 
Therefore, when using either method of de-dispersion, sev- 
eral trials of DM must be performed. 

2. Duration matching of transients. 
To obtain the maximum possible signal-to-noise ratio which 
is necessary for reliable detection matched filtering must 
be performed: "intensity" signals must be correlated to the 
template of the expected transient. This is especially im- 
portant for weak pulses. Usually, neither the form nor even 
the duration of transient are known. The wide range of time 
scales of radio transients and the absence of information 
about their form requires the making of multiple trials in 
order to "guess" at least the duration of a transient. Inte- 
gration intervals of "intensity" signals are tuned to find the 
optimal interval which coincides with the duration of the 
transient. This is an iterative approximation to the matched 
filtering. 

Both DM trials and iterative matched filtering require 
considerable computational efforts, especially taking into 
consideration other search parameters: observational sky fre- 
quency and celestial coordinates. In the following sections 
an algorithm alleviating this computational burden will be 
proposed. 

Essentially the detection of transients is the detecting 
of abrupt changes in a 2 . This means that a permanent mon- 
itoring of a 2 at different time scales should be performed. 
The algorithm implementing this monitoring should detect 
strong and weak changes in amplitude and short and long 
changes in time. This task is a kin to the detection of c hange 
points in stochastic processes ([Basseville &; Nikiforovl ll993). 

Two problems arising in the single-dish single transient 
observational situation are considered in this paper: 

1) The amplitudes, time intervals (duration) and mo- 
ments of arrival of transients are not known. It is there- 
fore difficult to design a matched filter to detect transients 
with unknown parameters. The time scale of transients is 
very wide. Very strong transients, like Crab Giant Pulses or 
Jupiter radio bursts, can be detected with simple threshold 
techniques but weak and rare transients can be missed if not 
all possible parameter values are tried. This deficiency can 
be crucial in the detection of the non-periodical sporadic 
unique transients. In radio astronomy often the signals-of- 
interest a 2 << o- 2 sys and a considerable amount of raw data 
samples n » 1 are required in order to detect transients. 

Here a framework is proposed for transients detec- 
tion algorithms based on the method of cum ulative sums 
(|van Dobbenlll968l:lBasseville fe Nikiforovlll993h which was 
first proposed in ( Page! 19541 ) . Let the process under scrutiny 
produce observed data xi, X2, X3.... There is a parameter as- 
sociated with the process. The process is said to be "in con- 
trol" if the measured mean of the parameter is close to the 
target value. The process, of course, exhibits its own natural 
variability, for example, in the case of a Gaussian noise. Dif- 
ferences between the observations and the target value will 
always occur. It is necessary to distinguish between ran- 
dom variations and systematic deviations due to the pro- 
cess being "out of control". This situation often occurs in 
quality control in industrial production lines. In our case 
the parameter-of-interest is the variance a 2 . Page proposed 



continuous accumulation of the differences of the measured 
parameter and the target value - calculation of cumulative 
sum (cusum). When the running estimate of cusum is below 
a specified threshold, the process is judged to be "in con- 
trol". As soon as cusum exceeds the threshold the change 
point is detected and the "out of control" signal is trig- 
gered. Details of the cusum algorithm will be given later. 
The cusum test uses the combined information of any num- 
ber of observations, in fact, of all observations that have 
been obtained up to the time of testing. The numerical pro- 
cedure is very simple and is easily mapped on the computer 
instructions set. 

Cusum is a special type o f sequential probability ra- 
tio test (SPRT) developed by (jWaldl Il947h . Suppose it is 
necessary to discriminate between two hypotheses about a 
parameter-of-interest, the null hypothesis Ho being that the 
process variable is within tolerable limits and the alterna- 
tive hypothesis Hi being that the process variable is biased 
by a specified value. At each sampling the ratio of the like- 
lihood of obtaining the observed values under Hi and Ho, 
respectively, is calculated. If the ratio is large, the alterna- 
tive hypothesis Hi is accepted; if the ratio is small, the null 
hypothesis Ho is accepted; and if the ratio has an intermedi- 
ate value, the decision is delayed until further observations 
have produced either a higher or lower ratio of the likeli- 
hood. Sequential detection is optimal in the sense that it 
requires minimal number of observations to trigger a query 
(average run length), i.e., necessary for the detection of a 
parameter's change. 

SPRT and its modification - cusum - can be useful 
tools in the situation of uncertainty about the duration of a 
transient. Using the above-mentioned terminology, the phe- 
nomenon of a transient (the increase and decrease of a 2 ) 
can be described as the "out of control" and the "in con- 
trol" state, respectively. Continuous calculation of cusum of 
the observational data up to the point of change provides an 
adaptive behavior and obviates the requirement for duration 
trials. 

Another problems are RFI. 

2) Man-made radio frequency interferences (RFI) very 
often produce signals similar to those of natural transients 
and the algorithms of transients detection must include pro- 
cedures for distinguishing between RFI and cosmic signals- 
of-interest. On a level with the spatial-temporal check (when 
a transient must be simultaneously registered at the distant 
antennas pointed to the same radio source in the sky and 
taking into account geometrical delay), a control check at 
one radio telescope is proposed which allows a distinction to 
be made between the natural transient (Gaussian pdf, pure 
random noise) and the man-made transients (RFI with non- 
Gaussian pdf and non-random behaviour). 



2 THE DETECTION ALGORITHM 

Raw input data after amplification, filtering and digiti- 
zation are presented as a sequence of random numbers 
Xi,i = l..n, statistically independent and identically dis- 
tributed, having normal distribution with zero mean pk(x) = 
— K= exp(— 0.5<7fc) k = 0,1, where ao corresponds to the 
absence of a transient, <7i corresponds to the presence of a 
transient. The n numbers are stored in the buffer. Starting 
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from x r , 1 ^ r ^ n, pdf po is changed to the pdf pi. The 
task is to find this change of noise variance from ao to o\. 

As elsewhere in the detection theory there are two hy- 
potheses: Ho - the absence of a change of variance at the 
interval 1 ^ i ^ n and Hi - the presence of the change 
point x r inside the interval 1 ^ r ^ n. The probability 
that the volume of data x belongs to the case of Ho is 
equal to Ph = YYi=iP°( Xi )^ wnereas the probability that 
part of the samples Xi, 1 ^ i ^ r, belongs to Ho and the 
other part of Xi,r ^ i ^ n, belongs to H± is equal to 
^ H i,r — E[i=i Po( x i) YYi= r .Pi( x 0- Therefore, the ratio of 
these two probabilities (the likelihood ratio) is 

This value has its maximum at r if there is a change of 
variance in the data. 

2.1 Detection with a known change moment and 
duration 

Let us suppose that the change moment r and the duration 
of transient n — r — N are known, then the matched filter can 
be applied for the detection of such a transient. The indexes 
in sums in this subsection will span from 1 to N limiting 
only the length of the transient. A at is compared with the 
threshold A and if 

LL P0{Xi) 
i—1 

Hi is chosen, i. e., the change in a is detected. The value 
of threshold A is chosen to minimize two kind of errors: 
the error of the first type is to give a false alarm, i.e., to 
make the decision in favor of Hi when Ho is valid, and the 
error of the second type is to miss the change point, i.e., 
to make the decision in favor of Ho when Hi is valid. The 
error probability of the first type is denoted by a and that 
of the second type is denoted by f3. The value 1 — j3 is the 
probability of detection. One of the conventional ways of 
choosing An is to maximize 1 — f3 for a given a (Neyman- 
Pearson criterion, (Whalen 197lh). 
Logarithm of J2} gives: 

N N N 

ln(A N ) = ln(T] ^M) = V IntpiO*)] " Y\ ^\p ( Xi )] = 

i—1 i=l i—1 

N 2 2 N 

i—1 i—1 

or: 

x * i ln(A) - In ^ 

— t > — — ^- (4) 

The left part of (|4|) corresponds to the usual radiomet- 
ric output (the averaged sum of the sample's squares), or 
the energy detector when the moment and duration of the 
transient are known. The threshold in the right part of 
J4} depends on ao and <ji, but for the signal increment 
Act 2 — o-\ — ao « o~o and large N » 1 which is the typical 
case in radio astronomy, we can compare xf with the 



value h = o~o +fc>0"o \ZV^' wnere ^ ne coefficient fco is chosen 
to satisfy the probability of false alarms a to be equal to the 
prescribed value. For example, for a — 0.05, ko = 1.645, for 
a 0.01, k 2.33, for a 0.001, k 3.09. 
The probability of detection for the given threshold h and 
the signal+system noise variance o\ is 

P det (h, ai) = 0.5 - 0.5er/(-^— ^L), (5) 

2crg v O/iV 

where erf(x) — J Q X exp(— t 2 )dt. 

Fig. [1] shows the probabilities of detecting transients: 
the top figure corresponds to a = 0.01 and the duration 
(the number of transients samples N) is the parameter, 
N = 10 3 , N = 10 4 and N = 10 5 , the horizontal axis is the 
transient amplitude Act 2 = g\ — ctq ; the middle figure shows 
the probabilities of detection 1 — f3 when a is the parameter 
and N = 10 3 ; the lower fi gure demonstrates the dependence 
of 1 — f3 of the transient duration A/" when cri = 1.0245 and 
a is the parameter. 

All these curves show the best possible outcome of the 
testing of the hypothesis when the moment of arrival of the 
transient and its duration are known. In real life this is not 
the case, and a bank of matched filters is necessary, each 
tuned to the particular r and N in the range of expected 
values. These trials on r and Af correspond to the general- 
ized likelihood ratio test which are additional computational 
burdens on the inevitable trials on the dispersion measure. 

Choosing the wrong A" may significantly worsen the 
probability of detection. Let us consider the situation when 
the "rectangular" signal with the amplitude a and duration 
number Af of samples must be detected at the background 
of noise with rms—a. For the known A", the signal-to- noise 
ratio after the averaging of A" samples of the mixture "sig- 
nal+noise" is snrN — 

For Ni > N the averaged signal is and the rms of 

the noise is a } . Therefore, the signal-to- noise ratio is 

snr Nl - 

For Ni < N the signal-to-noise ratio is snr n ± = — ^ — • The 
ratio -j^^ shown in Fig. [2] as the function of fevi = Ni/N 
gives an impression about losses due to the mismatch in the 
duration of the impulse. The decrease in the number of de- 
tectable radio sources is proportional to /c 3 ^ 4 for kN 1 < 1 

and to k^ 4 for /cat x > 1. 

In order to deal with the more realistic situation of un- 
known r and A", we will now consider another approach. 

2.2 Sequential analysis 

Until now the test with a fixed sample size A" (transient 

duration) has been considered. The sequential probability 

ratio test does not require the sample size to be fixed a 

priori but instead us es the data available at each particular 

moment, (|Waldlll947T ). The main idea of SPRT is to compare 

the likelihood ratio calculated for each i = l..n with two 

thresholds: 

accept Ho if A n ^ B, 

accept Hi if A n ^ A, 

continue to observe if B ^ A n ^ A. 

The thresholds A and B are chosen as 
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Figure 1. Probabilities of detection of noise-like transient with 
normal pdf and known duration and time of arrivel, the up- 
per figure: probability of false alarms a = 0.01, and the du- 
ration (the number of transients samples TV) is the parameter, 
TV = 10 3 , TV = 10 4 and TV = 10 5 , horizontal axis is the transient 
amplitude Act 2 = a\ — cr 2 ,; the middle figure: probabilities of de- 
tection when a is the parameter and TV = 10 3 ; the lower figure: 
probabilities of detection as functions of the number of samples 
TV and o\ = 1.0245 and a is the parameter. 



Figure 2. The relative snr due to the mismatch in the number 
of samples TVi detecting the transient which duration interval 
corresponds to TV samples. 



where a and f3 are the error probabilities of the first and 
second type, respectively, see section 2.1. 
Fig. [3] (the upper panel) shows the probabilities of detection 
and average decision time for SPRT tuned for a — 10 -3 
and f3 = 10 -3 and for matched detection: the upper panel 
- the probability of detection of a noise-like signal for the 
sequential test (solid line) and the duration-matched test 
(dotted line) as functions of input variance (ai) 2 ,ao = 1.0, 
a — 10 -3 , the number of the samples of the signal TV = 7200. 
There is no big difference between these two curves, whereas 
the middle panel shows the average decision time for the se- 
quential test (solid line) as the function of the input variance, 
the dotted line - number of signal samples (TV = 7200). 
There are three regions in the middle panel of Fig. [3] weak 
signals (left side), strong signals (right side) and intermedi- 
ate signals. The average number of samples which is neces- 
sary to make the decision is less than that of the matched 
detector for weak and strong signals and higher in the inter- 
mediate case. In the area of interest where the probability of 
detection is close to 1 the gain in the decision time for SPRT 
is obvious. Therefore, the number of observations before a 
decision is variable and depends on the observational situa- 
tion. The mean of this random value is the average sample 
number. 

The operative characteristic introduced by Wald for 
SPRT is a useful performance parameter. In our case it is 
the probability of accepting hypothesis Ho as a function of 
(ai) 2 : L(o~l). When <j\ — ctq, i. e., in the absence of signal, 
1 — L(ctq) = a (probability of false alarms). In the presence 
of signal, g\ > ao,L(cr 2 ) = P (probability of missing the 
signal) and 1 — L{a\) — Pdet (probability of signal detec- 
tion). It is visible from the lower panel of Fig. [3] that the 
detector provides a for g\ — 1.0 and f3 for g\ — 1.103 both 
equal to 10 -3 as was planned and the quality of detection 
improves with the growth of the signal. The average deci- 
sion time in the middle panel also reduces with the signal's 
amplitude. Therefore, if SPRT is tuned for <j\ which is cho- 
sen for the minimal expected signal, no deviation a 2 > <j\ 
is detrimental. 
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Figure 3. The upper panel: probability of detection of the 
noise-like signal for the sequential test (solid line) and duration- 
matched test (dotted line) as functions of input variance 
(cri) 2 ,cro = 1-0, a = 10 -3 , the number of the samples of the 
signal TV = 7200; the middle panel: average decision time for the 
sequential test (solid line) as function of input variance, the dot- 
ted line - number of signal samples; the lower panel: operative 
characteristic of SPRT, cr 2 = 1.103, a = lO" 3 , (3 = 10~ 3 . 



2.3 Cumulative sum method 

The next improvement of the detection algorithm is the fol- 
lowing modification of SPRT. Let us return to the expression 
(3) and denote 



2H\ 



\ 2 2 
) OO^I 



The sum in (3) is rewritten in the following sequence of 
recursive cumulative sums: 



s r = 

k, i = 1. 







(8) 



The behavior of S% is shown in Fig. [H 

Fig. [4^l shows n = 10 4 samples of Gaussian noise. Sam- 
ples numbered under r = 4000 have variance ctq = 1.0 and 
samples numbered above r — 4000 have variance g\ — 1.49. 
Fig.UJ) shows the cumulative sum (cusum) Sl,i = n..l, i.e., 
the calculation of cusum goes from the end of the data block 
to the beginning. The change point sample r can be found 
as 



rw - arg max (Si). 

l<i<n 



(9) 



The order of summation can be reversed: from the beginning 
to the end, i.e., to calculate cusum S%,i = l..n. In this case 
cusum looks as in Fig. and the change point sample F can 
be found as 



Tfwd 



arg min (Si). 



(10) 



The change point detection rule is the comparison of 
SI , i = l..n with the two thresholds ho and hi : when SI < ho 
the hypothesis Ho is accepted and when S™ > hi the hy- 
pothesis Hi is accepted. The intermediate values of Si dic- 
tate the continuation of testing (cusu m calculatio n). This is 
the philosophy of sequential analysis (|Waldlll947l ). 
Page proposed restarting the algorithm as long as the pre- 
viously taken decision is H p and also proposed making the 
lower threshold hn — 0. (|Pagd [l954h . It was shown later 
(jShirvaevlll96ll ; lLordenlll97lh that this is the optimal value 
for ho. Taking this into consideration, the change point de- 
tection rule can be modified in the following manner: 



?=mm{r:(S r 1 ) + ^h 1 }, 
where 



max 



(o,sn 



(ID 



(12) 



(SI) 

Fig. 2JI shows this modified cusum with the visible 
change point at i — 4000. The large amplitude of change 
has been chosen deliberately to better illustrate the behav- 
ior of cusum . 

If a change does not exactly correspond to ai , the algo- 
rithm is not optimal. In radio astronomy practice, the value 
of ai in can be chosen as a minimum expected change of 
variance which basically can be estimated from the knowl- 
edge of the system parameters: the antenna's effective area, 
the system temperature. For values larger than <ji the de- 
tection procedure is not optimal but is acceptable, because 
the growth of the change value yields a reduction in ARL 
and the advantage of an optimal procedure is not great. 

The detection performance of cusum is characterized by 
the probability of false alarm. The second kind of error /3 is 
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Figure 4. a) input data, n = 10 4 , variance before change 
(Jq = 1.0, variance after change at point r = 4000, a\ = 1.49; 
b) backward cumulative sum S™] c) forward cumulative sum S^; 
d) modified forward cusum. 



not considered in the cusum analysis because hypothesis Ho 
is never accepted: each time the cusum is equal to zero, the 
algorithm restarts. The cusum test will eventually reject the 
hypothesis Ho. The number of samples up to this rejection 
of Ho is also a random variable, as in SPRT, and is called 
the run length. Similarly to SPRT, the following parame- 
ter is introduced for cusum: the average number of samples 
from the starting point up to the point at which the deci- 
sion threshold h = hi is crossed is called the average run 
length, ARL. The ARL, the average sample number n and 
the operative characteristic L are related by 



ARL(a ,(Ti) 



n(ao,cri) 



1 



(13) 



L(ao,ai) 

A theoretical calculation of ARL is a complicated proce- 
dure. Details of these calculations an d tables with usefu l 
practical results can b e fou nd in |yan Dobben 1968), 
dBasseville fc Nikiforovl Il993h and (Ha wkins fe Olwelll 
ll998T ). Here we are interested in comparing cusum per- 
formance with the matched detector and the computer 
simulation was performed for the purpose of demonstration. 



2.4 Computer simulation 



The following computer simulations were made to compare 
cusum with width-matched detection. The number of sam- 
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pies of noise with normal pdf A/"(0,a) is N = 10 4 . In the 
absence of signal a — 1.0 for all N samples. A change of 
a > 1.0 must be detected. The width-matched algorithm in- 
tegrates all N samples and the decision about the presence 
of a signal (any change in a > 1.0) is made when the esti- 
mate of variance exceeds the threshold h = cfq -\-koaoy^2/N, 
where ao = 1.0 and ko = 3.09 is chosen to keep the proba- 
blity of false alarm at a = 10 -3 . 

The cusum algorithm (|lip and (|12p is tuned with ao = 
1.0 and <ti = 1.05. The threshold hi was also chosen to keep 
a = 10 -3 . Both tests (matched detection and cusum detec- 
tion) were repeated 100 times. 

Averaged results are given in Fig. \5\ The upper panel shows 
that the probabilities of detection as functions of the signal 
increment for cusum are slightly lower in the region of un- 
certain detection and are close to 1.0 at the same value as 
for the width-matched detector. 

The lower panel shows the dependence of the average run 
length on the signal power. The number of samples which 
are necessary for signal detection decreases and at the point 
of Pdet ~ 1.0 is four times less than N = 10 4 . ARL contin- 
ues to decrease further with the growth of the signal, while 
N remains constant being preset for the matched detector. 
This is the important property of cusum. 
It must also be mentioned that the c omparison of different 
types of transient detectors made in (|Wang fc Willetl feoOO) 
showed the advantage of using cusum. 



2.5 Examples from observations 

The pulsar machine PUMA-2 installed at WSRT allows raw 
data to be stored in the 20MHz bandwidth. The radio tele- 
scope works in tied-array mode in which all 14 signals from 
antennas are added in phase, i.e., there is one output as for a 
single dish. The 20MHz baseband signals are digitized (8bit, 
4 • 10 7 samples/sec) and stored in the mass storage system 
which has sufficient capacity to support 24 h of continuous 
observations. Signal processing can therefore be undertaken 
off-line. 

A block of data corresponding to « lOsec observation 
of pulsar B0329+54 at the sky frequency 1420 MHz is used 
here for demonstrating transient detection. Fig. \6\ shows the 
sequence of pulsar impulses: total power detector and inte- 
gration at 5 • 10 _4 sec (20000 samples). The raw data corre- 
sponding to the time interval from 0.7 sec till 0.77 sec (an 
area around the second impulse in Fig. [6]) is given by the 
total number of samples 2.8 • 10 6 which is too large to be 
represented in one figure. Therefore, only one sample from 
each 50 is shown in Fig. [3 (upper panel), i.e., there is dec- 
imation equivalent to a 50 times reduction of the effective 
bandwidth: to 0.4MHz. 

The cumulative sum calculated with these decimated sam- 
ples is shown in Fig. (lower panel) and clearly indicates 
the pulsar impulse, with no assumption made about its du- 
ration. 

Fig. [8] shows another way of using cusum. The output of the 
total power detector (TPD) with integration over 50 sam- 
ples is shown in the upper panel of Fig. [8j This waveform 
corresponds to the second and third impulses in Fig. [6] (from 
0.7sec till 1.9sec). Insufficient integration reveals some small 
increase in noise variance at the positions of impulses. The 
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Figure 6. Impulses of pulsar B0329+54 observed at WSRT with 
PUMA-2. Sky frequency^ 1420MHz, bandwidth=20MHz, inte- 
gration time=5 • 10 — 4 sec. 
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Figure 7. The upper panel: raw data corresponding to the area 
of the second impulse in Fig. [6] The lower panel: cusum calculated 
with the samples of raw data shown in the upper panel. 



lower panel shows the cumulative sum calculated using the 
data represented in the upper panel of Fig. [8] And again, no 
assumption was made about the duration of impulses. The 
ability to detect these two transients is clear. 
Looking at the lower panel of Fig. [8] one could imagine that 
this waveform could equally well be obtained with a smooth- 
ing algorithm or with a low-pass digital filtering procedure. 
But the principal advantage of cusum is the freedom from 
any choice of matching integration parameters: the smooth- 
ing interval or the cut-off frequency of the digital filter. 

Until now the one-dimensional approach was consid- 
ered. The usual practice includes the spectral analysis of 
the data with FFT or filter banks. It means that there 
is a two-dimensional array: a time-frequency plane. This 
multi-narrow-band filtering is useful for flagging of RFI 
and de-dispersion. In principle, cusum can be calculated 
in each frequency channel, if an expected transient is 
strong enough. Otherwise, the channels free from RFI after 
de- dispersion can be averaged and analyzed with cusum 
algorithm. 
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Figure 8. The upper panel: TPD data corresponding to the 
area around the second and third impulses in Fig. [6] integra- 
tion time=1.25- 10 — Q sec. The lower panel: cusum calculated with 
samples of the data shown in the upper panel. 



2.6 Transient detection in the presence of 
dispersion 

Transients are broadened by propagation effects and their 
amplitude decreases. To compensate this effect trials of dif- 
ferent values of dispersion measures are usually made in or- 
der to concentrate all possible pulse energy into one narrow 
burst. After each DM trial the time integration (duration 
matching) is performed to detect the transient. Cusum can 
be used at this stage to reduce the number of trials of tran- 
sient's durations. 

In the ideal case of phase-only distortions during propa- 
gation, the total energies of the broadened pulse and the ini- 
tial narrow pulse (without dispersion) are equal. When the 
duration Wi of a pulse is known and its amplitude is equal 
to AMPi the probabilit y of detection can be estimated by 
the well-known formula, Whalen (1971): 



\/E/N 



exp(—u 2 /2)du, 



(14) 



where E — AMPfWi is the pulse energy and No = 



is the spectral density of noise, Af is the 



bandwidth of noise, r cor is the noise correlation interval. We 
see that the probability Pdet depends only on the param- 
eter q = E/No = ^ . 11 the energy is the same 

°" T cor 

both for broad and narrow pulses then Pdet will also remain 
unchanged. For example, the propagation effects result in 
broadening the pulse from Wi to mWi and the amplitude 
is reduced to AMPi/^/m (to keep energy E constant) and 

, (AMP 2 /m)xmWi 

then we get % — 



= q- 

Taking this into consideration, it can be conjectured that 
the cusum algorithm may detect a pulse broadened due to 
moderate dispersion. Only detection is considered here. Ex- 
act reproduction of the transients's form, of course, requires 
the knowledge of DM and de-dispersion. 
However, conventional practice in the search for transients 
includes de-dispersion before time integration. 

Here we consider another effective method of non- 
coherent de-dispersion: the application of the Hough trans- 
form (HT) which is widely used in image processing for 



line detection, see iDuda fe Hartl (|l973h and the survey of 
llllingworth &; Kittled (|l988h . A transient's track without de- 
dispersion on the time- frequency r—f plane is the second or- 
der curve which can often be approximated by a straight line 
if the bandwidth is sufficiently small. Here we show how HT 
can be used to detect a straight line on the time-frequency 
r — f plane. Details of HT and example of computer simu- 
laten are given in Appendix A. 

The following example will demonstrate the application of 
HT to LOFAR observational data. Fig. [9] represents the 
three-dimensional image of the r — f plane with the pul- 
sar B0329+54 impulses, 800 frequency channels, each chan- 
nel's bandwidth is Sf = 0.01220703125Mif2, time sample 
interval 8t — 0.00131072s. The left channel frequency is 
fo = 175.29296875Mifz. Several RFI are also visible on 
the left and right side of the image. 

The 800 x 800 pixels fragment shown in Fig. [10] around the 
8.0s area is chosen for processing in the following way. The 
threshold equal to « mean + rms of the data's values is ap- 
plied to the array of numbers represented in Fig. [TO] and the 
corresponding binary image is shown in Fig. 1111 This binary 
image is then Hough-transformed and the result is shown in 

Fig. mi 

There are two distinct peaks on the left side of this fig- 
ure corresponding to the two pulsar's tracks in Fig. [TUl and 
Fig. [11] The angular coordinate of these peaks which is 
the slope of the pulsar's tracks is the same and is equal 
to = —69.75°. The module of this value can be recalcu- 
lated to the dispersion measure: DM — 26.8 which is not far 
from the tabulated value for this pulsar (26.7). The distances 
from the center of the initial image to the straight lines are 
measured by another coordinate p. So, HT gives the value 
of DM which can be used in delay alignment between fre- 
quency channels, similar to non-coherent de-dispersion. 
There are several other peaks in the center of Fig. [12] corre- 
sponding to RFI, but their angle coordinates ~ because 
the tracks of these narrow-band RFI are perpendicular to 
the frequency axis (no dispersion). This is a convenient way 
to distinguish the transient's tracks with nonzero DM from 
RFI on the r — f plane. 

When a transient's track on the r — f plane cannot be 
approximated by the straight line, HT can be modified to 
detect the second order curve. There are many v ersions of 
HT adapted to detection of particular curves, see iRao fe Lil 
(|l992h . 

The additional useful property of the HT is the accumu- 
lating of a number of points belonging to the same straight 
line, which is, in essence, averaging. Having the slope of the 
track of a transient on the r — f plane it is not only possible 
to estimate its dispersion measure with HT , but also to de- 
tect a weak transient using this averaging property of HT. 
The computer simulation example in Appendix A shows that 
even for a weak signal (a line on the time-frequency plane) 
the distinct peak can be detected after HT. 

The cusum algorithm can also be applied to data on the 
p — Hough plane. Tracks belonging to the same transient 
form the set of parallel adjacent lines or the strip. The width 
of the strip is proportional to the duration of the transient. 
After HT this strip will be represented by the vertical set of 
adjacent points on the p—0 plane: coordinate is determined 
by the slope of the strip and the range of coordinate p is 
proportional to the width of the strip or the duration of the 
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Figure 9. Three-dimensional presentation of the pulsar 
B0329+54 impulses, LOFAR observational data, 800 frequency 
channels, left channel frequency is fo = 175.29296875Mifz, chan- 
nel's bandwidth is Sf = 0.01220703125M#z, time sample inter- 
val is St = 0.001310725. RFI are also visible. 
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Figure 10. Pulsar tracks from Fig. [9] chosen for the dispersion 
measure analysis. 

transient. Therefore, cusum can be calculated along the axis 
p for each in the same way as it is made on the r — f plane 
and weak transients will be better detected. 



3 STATISTICAL TESTS AFTER CHANGE 
DETECTION 

The detected signal may be of natural (celestial) origin or 
be RFI. To distinguish between these two hypotheses some 
statistical tests can be applied to the input data. Three of 
them were studied here: 

1. The Wald-Wolfowitz (W- W) test (runs t est) which checks 
the randomness of the data (Bradley 1968); 

2. The single-sample Kolmogorov-Smirnov (K - S) test of the 
goodness- of-fit to the norm al cumulative distribution func- 
tion fcdf ^Eadie et alJll97lh : 

3. The Jarque&Bera (J-B) goo dness-of-fit test using sample 
skewness and sample kurtosis (jjaraue fe Beralll980h . 
Stationarity of the random process "inside" a transient is 
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Figure 11. Binary 800 x 800 image corresponding to Fig. 1101 



After Hough transfform 




-400 angle 
Figure 12. Hough transform image of Fig. 1111 

assumed. The test data were divided into 2 groups: the 
first group - noise with normal pdf J\f(0,a sys ), and the sec- 
ond group - noise with normal pdf A/"(0, a sys ) plus BPSK 
(binary phase-shift keying) signal which was modelled by 
Si=amp x sin(27rFz + 4>i),4>i = 4>i-i + tv x psn(X), where 
F — 0.1 and psn(X) is the random value with Poisson pdf, 
parameter A = 0.1. The amplitude amp is the parameter 
in the following tests. All three statistical tests were tuned 
on the significance level (false alarm probability) a = 0.05. 
Each test was repeated Ni = 100 times with the data (num- 
ber of samples M = 10 4 and M = 10 3 ) corresponding to 
amp = 0, no change, and then N 2 = 100 times for crj > 
(noise-like signal) or amp > (RFI), i.e., in the case of 
change. 

The results of the first part (system noise + noise-like signal 
with normal pdf) are similar for all three tests: the prob- 
abilities of a wrong decision (the signal is non-random or 
non-Gaussian) are at the level of 0.05, i.e., at the level of 
false alarms. 

The results of the second part of testing are shown in Fig. 
ITTTI The upper figure shows the probabilities of correct de- 
cision as functions of the input RFI ('signal')-to- noise-ratio 
snr — {amp 2 /2) / a1y S for three tests. The number of sam- 
ples is equal to M — 10 4 . 
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Figure 13. Results of statistical testing of data (signal or RFI) 
for methods: Wald-Wolfowitz (runs test, W-W), Kolmogorov- 
Smirnov (K-S) and JarqueBera (J-B). Upper panel: the number of 
samples M = 10 4 , lower panel: the number of samples M = 10 3 . 



Table 1. Results of statistical testing of data (signal or RFI) for 
the Kolmogorov-Smirnov test in the case of exponential pdf for 
two numbers of samples M = 10 4 and M = 10 3 . 



snr 


M = 10 4 


M = 10 


0.00 


0.05 


0.05 


0.01 


0.17 


0.14 


0.02 


0.49 


0.16 


0.03 


0.78 


0.20 


0.04 


0.92 


0.24 


0.05 


1.0 


0.30 


0.06 


1.0 


0.44 


0.07 


1.0 


0.62 


0.08 


1.0 


0.69 


0.09 


1.0 


0.77 


0.10 


1.0 


0.81 


0.15 


1.0 


0.97 


0.20 


1.0 


1.0 



The lower figure shows the probabilities of correct decision as 
functions of the input RFI ('signal')-to- noise-ratio for these 
tests when the number of samples is equal to M = 10 3 . 

The main conclusion is that the runs test and 
Kolmogorov-Smirnov test detect signals with non-normal 
pdf earlier than the Jarque&Bera test. The K-S test uses 
the whole empirical pdf for comparison with the normal pdf, 
whereas the J-B test uses empirical kurtosis (the modeled 
RFI does not alter the skewness), which has considerable 
sample variance: « 24/M for normal pdf. Therefore, weak 
non- Gaussian transients can be better detected by W-W and 
K-S tests. 

The Kolmogorov-Smirnov test was also applied in the 
case of exponential pdf which is the pdf of the power spec- 
trum at each frequency in the absence of RFI. Any sig- 
nificant deviation from exponential pdf at a particular fre- 
quency indicates that the signal at this frequ ency does no t 
correspond to pure noise with a normal pdf (|Fridmanll200l[ ) , 
i.e., can contain RFI. Table 1 shows the results of statistical 
tests similar to those shown in Fig. 1131 but only for the K-S 
test. 

A comment must be made here. In practice, even system 
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Figure 14. Waveforms of noise in the reference channel (upper 
panel) and in the channel with RFI (lower panel). 
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Figure 15. Empirical cumulative distribution functions of noise 
in Fig. 1141 in the reference channel (dash line) and in the channel 
with RFI (solid line). 
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Figure 16. Waveforms of noise in the reference channel (upper 
panel) and in the channel with RFI (lower panel). 
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Figure 17. Empirical cumulative distribution functions of noise 
in Fig. 1161 in the reference channel (dash line) and in the channel 
with RFI (solid line). 



noise rarely has ideal normal pdf. There are always slight 
deviations due to digitization and digital filtering. The de- 
viations which are visible in histograms are also easily de- 
tected by statistical tests. This pdf is usually stable enough 
to be used as a reference function for the two-sample K-S 
test, etc., and changes in the distribution function produced 
by RFI must be compared with this reference system noise 
non-normal pdf. 

Figures IT4l - 1 1 71 illustrate the application of these statis- 
tical tests to real observational data. Signals at the LOFAR 
pulsar backend were used. There are four 992-channel fil- 
ter banks, each frequency channel having the band- width 
equal to ~ YlflKHz. Some of these channels contain RFI. 
The channel #029 (the central frequency^ 139.3066 MHz) 
without RFI was chosen as the reference channel (see the 
waveform of 10 4 samples of noise in this channel in Fig. 1141 
the upper panel). The waveform of the channel #018 (the 
central frequency = 139. 1724 MHz) with RFI is shown in the 
lower panel of Fig. 1141 All above-mentioned tests show non- 
Gaussian and non-random (runs test) behaviour of noise in 
channel #018. 

The two-sample Kolmogorov-Smirnov test was also applied 
to distinguish the difference in the empirical pdf of the noise 
in the reference channel and in the channel with RFI. The 
Gaussian property was not tested here, only the deviation 
from the reference pdf. Fig. [15] shows these two empirical 
pdf. The distinction between the two curves is clearly visi- 
ble. 

Figures [16] and [17] show another example: the refer- 
ence channel being #500, (the central frequency^ 181. 3965 
MHz) and the channel with RFI #512, (the central fre- 
quency^ 181. 5430 MHz). In this case too all tests indicate 
the presence of RFI. 

The RFI detection algorithms described in this section 
must be " switched on" after the detection of a transient for 
the purpose of diagnostics, i. e., to reject RFI transients. 
So, the computational burden is not very large. The com- 
putational complexity for W-W and K-S is proportional to 
nlog(n\ n is the number of samples. The J-B test includes 
calculation of statistical moments and the time of comple- 
tion is proportional to n. 



All known and relevant methods of RFI mitigation can be 
used during observations but I have here considered only 
the case of when a transient has been detected and it is 
necessary to distinguish it from RFI. 



4 CONCLUSIONS 

1. Searches in several dimensions (time, frequency, disper- 
sion measure) are a considerable computational burden dur- 
ing transients detection observations and processing. The 
cumulative sum method which is the modification of Wald's 
sequential analysis can be helpful in this situation. Com- 
puter simulation and the processing of real observational 
data show that the time-consuming search into the duration 
of a transient (duration-matching) can be reduced using the 
cusum algorithm. Cusum can be used both after coherent 
and non-coherent de-dispersion. The number of computer 
instructions required by cusum grows linearly with the num- 
ber of data samples n. The benefit of using cusum is pro- 
portional to the number of trials which would be undertaken 
with the conventional iterative matching procedure. One of 
the trial procedures in the search for transients can therefore 
be eliminated. 

2. The number of trials during non-coherent de- 
dispersion can be reduced using the Hough transform which 
measures the slopes of the transient's tracks on the time- 
frequency plane and, as a result, gives estimates of DM. 
The averaging property of HT contributes to the detection 
of weak transients. The combination of cusum algorithm 
and HT provides both an estimation of DM and a dura- 
tion matching. 

Due to parallelism and the "binary" character of HT many 
fast HT algorithms were developed in recent years to be 
implemented in FPGA and multi-core computing systems. 
Cusum can also be easily mapped on FPGA. In the situa- 
tion of the a priori uncertainty both of DM and duration 
this property promises to be useful, including in real-time 
searches for transients. 

3. Cusum is also a good tool for RFI detection ([Fridmanl 

1996). Burst-like RFI, erroneously identified as natural tran- 
sients, can be distinguished from the noise-like signal-of- 
interest by the difference in statistics: as a rule, RFI are non- 
random and non- Gaussian. The proposed statistical tests al- 
low RFI and transients of natural origin to be distinguished 
from one another. The tests are based on the statistical 
analysis of the random numbers representing a transient. 
They are recommended to be used after the detection of a 
transient and require moderate computational efforts: only 
data suspected as being that of a transient are analyzed 
and the number of computer instructions is proportional to 
~ nlog(n). 

There are, of course, many other methods of RFI mitigation 
which can be applied during the entire observation in order 
to prevent false alarms. Everything depends on the type of 
RFI which is hindering observations. 

4. The methods proposed in this article do not exclude 
the use of the most powerful likelihood criterion of the tran- 
sient's celestial origin: simultaneous observations of the tran- 
sient signal coming from the same area of the sky to several 
radio telescopes which are far distant from each other. 
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APPENDIX A: THE HOUGH TRANSFORM AS 
A TOOL FOR THE DETECTION OF A 
DISPERSED TRANSIENT 

The Hough Transform (HT) is a useful algorithm for straight 
line detection in binary images when amplitudes of pixels are 
equal to two numbers, for example, 1 or 0. Each straight line 
y — ax + b on a plane can be parameterized by the angle 
of its normal to the horizontal axis and its distance p from 
the origin of coordinates. The equation of a line is 

y = -x^ + . (Al) 
sin ^ sin ^ 



This equation can be rewritten for p as a function of (x,y): 

p = xcos{6) + ysin{6). (A2) 

A line can then be transformed into a single point in the pa- 
rameter space (p, 0) which is called the Hough space. For any 
pixel in the image with position (x,y), an infinite number 
of lines can go through that single pixel. By using equation 
(A2) all pixels belonging to the line can be transformed into 
the Hough space. A pixel is transformed into a sinusoidal 
curve that is unique for this pixel. Doing the same transfor- 
mation for another pixel gives another curve that intersects 
the first curve at one point in the Hough space. This point 
represents the straight line in the image space that goes 
through both pixels. This operation is repeated for all pix- 
els of the image. The pixels belonging to the same straight 
line have the same point of intersection in the Hough space. 
For each point in the Hough space the HT program as- 
signs a counter which accumulates the number of these in- 
tersections. So if there is a straight line with the parameters 
(pi,0j) consisting of N pixels the counter corresponding to 
the point in the Hough space will contain number N. 

This is correct only for the ideal case when there is no noise 
in the image. In the case of a noisy image the situation is 
the following. 

Let the time- frequency r — f plane being the image on which 
the search for transients is performed. Each line of pixels 
along the time axis represents "intensity" samples at the i- 
th output of FFT or a filter bank. 

In the absence of transients, these samples are random num- 
bers - often with a Gaussian pdf due to preliminary aver- 
aging. A binary image is created using the threshold thr 
equal to thr « a + m where a is the rms of the noise sam- 
ples and m is the mean value. If the amplitude of a sample 
is less than thr it is converted to 0, otherwise it is equal 
to 1. Now the binary image is covered with the random 
and 1 and the probability p of 1 is equal to « 0.16. For the 
size of the image M x M the number of l's is ~ p X M 2 . 
It is difficult to calculate precisely noise pdf on the (p, 0) 
plane. So the approximate estimates of the first two mo- 
ments, confirmed by co mputer sim ulation, are given here: 
mean^ p x M . rms~ \/p(l — p)M. 

The image on the r — f plane of a transient after disper- 
sion due to wave propagation in the interstellar media is a 
second-order curve line. In a narrow range of frequencies the 
curve can be approximated as a straight line. The width of 
the band for which this approximation is valid depends on 
the observational sky frequency. 

To better understand the HT a computer simulation exam- 
ple is given here. Let the 400x400 image in Fig. [Al] represent 
the "noisy" r — f plane with zero mean and a — 1.0. There 
is also the straight line imitating the track of a dispersed 
transient. Equation of this line in the coordinates (x,y) is 
y = ax + b where a — —2.0 and b — 400. The amplitude 
of the line is At = 1.0. After the threshold thr = a we get 
the binary image shown in Fig. IA2I The Hough transform 
of the image in Fig. IA2I is represented in Fig. [A3] where the 
peak corresponds to the straight line in Fig. IA1I The search 
for the maximum amplitude of the peak gives the estimate 
= —63.22° while the exact number is = —63.43°, see 
Fig. IA4I The origin of coordinates in the Hough plane is 
positioned at the center of the image. 



A method of detecting radio transients 13 



primary image 



After Hough transfform 



100 150 200 250 300 
number of frequency channel 






200 



angle 



Figure Al. Image of Gaussian noise with zero mean and rms = 
a — 1.0 and superposed straight line y = ax + 6, a = —2.0 and 
b = 400. The amplitude of the line is A T = 1.0. 
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Figure A2. Binary version of image in Fig. IA11 threshold thr ■ 



The useful property of the HT is in the accumulation of 
the number N of coincident points in each cell of the Hough 
image (belonging to the same straight line), thus averaging 
samples along the line on the primary image. If the initial 
SNR — At I (J ~ 1.0, as in our example, the expected SNR 
in the Hough plane will be ~ y/~N. 

Therefore, it is not only possible to estimate the disper- 
sion measure with the help of HT by the slope of a tran- 
sient's track, but also to detect a weak transient, using the 
averaging property of HT. 

Transition from the primary image (r — / plane) to the 
binary image leads to some loss. It is convenient to estimate 
this loss by comparing the probability of detection Pdet after 
averaging along the transient track with and without bina- 
rization. Fig. IA5I shows these Pdet as functions of the nor- 
malized amplitude At I (J of the track for four cases: without 
binarization and binarization with three different thresholds 
thr = 0; a; 1.5cr. The number of averaged samples along the 
track is M — 100. The probability of false alarms for all four 
curves is a = 10 -3 . 

Comparing these curves the following conclusions can be 
made: 



Figure A3. Three-dimensional presentation of the Hough trans- 
form of the image in 

Fig. El Coordinates [0,0] are in the cen- 
tre of the image. Peak corresponding to the straight line is at 
= -63.22° and p = -90. 



plane rO - 6 




Figure A4. The (p, 0) plane of the Hough transform of the image 
in Fig. IA2I Coordinates [0, 0] are in the centre of the image. The 
peak indicated by the arrow is at = —63.22° and p = —90. 
Precise value of = —63.43° 



a) there is a certain loss (~ 1.5dB) due to binarization; 

b) there is no practical differences between the curves with 
binarization in the area Pdet ~ 1.0. 
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Figure A5. Probability of detection after averaging along the 
transient track, the number of samples in the track M = 100: 
without binarization and binarization with different thresholds 
thr = 0; cr; 1.5a. The probability of false alarms a = 10 -3 . 
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